Recovering Fine Details for Neural Implicit Surface Reconstruction
Recent works on implicit neural representations have made significant
strides. Learning implicit neural surfaces using volume rendering has gained
popularity in multi-view reconstruction without 3D supervision. However,
accurately recovering fine details is still challenging, due to the underlying
ambiguity of geometry and appearance representation. In this paper, we present
D-NeuS, a volume rendering-based neural implicit surface reconstruction method
capable of recovering fine geometric details, which extends NeuS with two additional
loss functions targeting enhanced reconstruction quality. First, we encourage
the rendered surface points from alpha compositing to have zero signed distance
values, alleviating the geometry bias arising from transforming SDF to density
for volume rendering. Second, we impose multi-view feature consistency on the
surface points, derived by interpolating SDF zero-crossings from sampled points
along rays. Extensive quantitative and qualitative results demonstrate that our
method reconstructs high-accuracy surfaces with details, and outperforms the
state of the art.
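The second loss above relies on surface points obtained by interpolating the SDF zero-crossing between consecutive samples along each ray. A minimal NumPy sketch of that interpolation step, with illustrative function and variable names (not the paper's actual implementation):

```python
import numpy as np

def surface_point_from_sdf(points, sdf_values):
    """Locate the first SDF zero-crossing along a ray by linear interpolation.

    points:     (N, 3) samples along one ray, ordered near-to-far.
    sdf_values: (N,) signed distances at those samples.
    Returns the interpolated 3D surface point, or None if the ray misses.
    """
    # A sign change between consecutive samples brackets the surface.
    signs = np.sign(sdf_values)
    crossings = np.where(signs[:-1] * signs[1:] < 0)[0]
    if len(crossings) == 0:
        return None
    i = crossings[0]
    s0, s1 = sdf_values[i], sdf_values[i + 1]
    # Linear interpolation: fraction of the way from sample i to i+1
    # at which the SDF reaches zero.
    t = s0 / (s0 - s1)
    return (1 - t) * points[i] + t * points[i + 1]
```

In the paper's setting, the feature-consistency loss is then evaluated at points recovered this way, so gradients flow back to the SDF network through the sample locations.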
RUSHES—an annotation and retrieval engine for multimedia semantic units
Multimedia analysis and reuse of raw, un-edited audio-visual content, known as rushes, is gaining acceptance among a large number of research labs and companies. Several research projects are considering multimedia indexing, annotation, search and retrieval in the context of European-funded research, but only the FP6 project RUSHES focuses on automatic semantic annotation, indexing and retrieval of raw and un-edited audio-visual content. Professional content creators and providers as well as home users are dealing with this type of content, and therefore novel technologies for semantic search and retrieval are required. In this paper, we present a summary of the most relevant achievements of the RUSHES project, focusing on specific approaches for automatic annotation as well as the main
features of the final RUSHES search engine.
CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state-of-the-art during the first year of Chorus and establishing the existing landscape in
multimedia search engines, we have identified and analyzed gaps within European research effort during our second year.
In this period we focused on three directions, notably technological issues, user-centred issues and use-cases and socio-
economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of the functional
breakdown of a generic multimedia search engine, and secondly, representative use-case descriptions with a related
discussion of the requirements for technological challenges. Both studies have been carried out in cooperation and consultation with the
community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our
Think-Tank, presentations in international conferences, and surveys addressed to EU projects coordinators as well as
National initiatives coordinators. Based on the obtained feedback we identified two types of gaps, namely core
technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research
challenges, but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal
challenges.
Supporting linguistic research using generic automatic audio/video analysis
Automatic analysis can speed up the annotation process and free up human resources, which can then be spent on theorizing instead of tedious annotation tasks. We will describe selected automatic tools that support the most time-consuming steps in annotation, such as speech and speaker segmentation, time alignment of existing transcripts, automatic scene analysis with respect to camera motion, face/person detection, and the tracking of head and hands as well as the resulting gesture analysis.
National Foreign Language Resource Center
Image-based rendering for teleconference systems
To obtain an image-based immersive presence in a virtual world, two important factors should be considered: system configuration and multiview representation. We present two non-adversary system configurations. The first is the well-known convergent wide-baseline set-up, while the second is a unique proposal under investigation at our institute, based around a parallel multiple narrow-baseline camera set-up. In the domain of multiview representation we introduce two non-conflicting representations that can be implemented independently of the chosen system configuration, depending on whether compression or scalability is important to the overall system. We then discuss our implementation of an image-based rendering system for an immersive teleconferencing application in which three conferees meet around a shared virtual table. The system uses a wide-baseline configuration with two stereo camera pairs capturing the reference images. The system is designed to deal with hand gestures as well as the synthesis of areas occluded in one or more of the reference images but required in the derived view. We introduce the notion of a confidence map, designed to indicate, for the derived image, which reference image should provide the required texture and disparity information for a surface.
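The confidence map mentioned above selects, per pixel of the derived view, which reference image should supply texture and disparity. A minimal NumPy sketch of that selection idea, where the cost-based confidence rule, the threshold, and all names are illustrative assumptions rather than the paper's exact formulation:

```python
import numpy as np

def select_reference_view(match_costs, cost_threshold=1.0):
    """Per derived-view pixel, choose the reference view whose stereo match
    is most reliable, and flag pixels that no view covers well (e.g. areas
    occluded in every reference image).

    match_costs: (V, H, W) stereo matching cost per reference view
                 (lower cost = higher confidence).
    Returns an (H, W) array of best-view indices and an (H, W) boolean
    mask of unreliable/occluded pixels.
    """
    best_view = match_costs.argmin(axis=0)               # most confident view
    occluded = match_costs.min(axis=0) > cost_threshold  # no reliable view
    return best_view, occluded
```

Pixels flagged in the mask would then be filled by the synthesis stage rather than copied from a single reference image.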
ACM multimedia 2010 workshop on 3D video processing
Research on 3D video processing has gained a tremendous amount of momentum due to advances in video communications, broadcasting and entertainment technology (e.g., animation blockbusters like Avatar and Up). There is an increasing need for reliable technologies capable of visualizing 3D content from viewpoints chosen by the user; the 2010 football World Cup in South Africa made very evident the need to replay crucial football footage from new viewpoints to decide whether the ball has or has not crossed the goal line. Remote videoconferencing prototypes are introducing a sense of presence into large- and small-scale (PC-based) systems alike by manipulating single and multiple video sequences to improve eye contact and place participants in convincing virtual spaces. All this, and more, is pushing the introduction of 3D services and the development of high-quality 3D displays to be available in a future which is drawing nearer and nearer.
MULTIPLE NARROW-BASELINE SYSTEM FOR IMMERSIVE TELECONFERENCING
An important aim of immersive teleconferencing systems is to create realistic 3D virtual views of remote conferees. Hence, systems should be able to deal with hand gestures as well as with areas that are occluded in the reference images but required in derived views. The quality of such derived views depends not only on the analysis and synthesis process but also on the multiview camera set-up. The popular convergent wide-baseline stereo approach often aspires to achieve too much through a single camera pair: maximum information and reliable disparity maps. We identify how this dichotomy leads to problems in the analysis and synthesis process, often resulting in a restrictive, system-specific solution. We then define a new approach, a multiple narrow-baseline set-up, designed to overcome the limitations of the wide-baseline set-up: it is modular, both in terms of system requirements and algorithmically, and scalable with respect to the number of conferees. Key words: multiple narrow-baseline, immersive teleconferencing, confidence map.